Graphing Data from our final product

By Minh Phan

The first five notebooks demonstrate the process of obtaining and combining data into the final dataset product that we feed into our machine learning models. This tutorial, therefore, guides you through some of the fundamental functions you may use while exploring the data.

This tutorial uses the Xarray library. You could also use the Zarr library, which specializes in handling this type of dataset, but Xarray is better known in the Python developer community, hence our preference. It also converts seamlessly to and from other popular data libraries such as Pandas, NumPy, and Dask.
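As a quick illustration of that interoperability, here is a minimal sketch on a small synthetic array (not our dataset) showing round trips between Xarray, Pandas, and NumPy:

```python
import numpy as np
import pandas as pd
import xarray as xr

# a small synthetic DataArray: 3 days x 2 latitudes
arr = xr.DataArray(
    np.arange(6.0).reshape(3, 2),
    coords={"time": pd.date_range("2000-01-01", periods=3), "lat": [10.0, 10.25]},
    dims=("time", "lat"),
    name="speed",
)

df = arr.to_dataframe()                       # Pandas DataFrame with a (time, lat) MultiIndex
vals = arr.values                             # the underlying NumPy array
arr2 = xr.DataArray.from_series(df["speed"])  # and back to an Xarray DataArray
```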

Import necessary libraries

! pip install cmocean
! conda install --channel conda-forge pygmt -y
import xarray as xr
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import holoviews as hv
import hvplot.xarray
import cmocean
import cartopy
import pygmt

Read data

# Change the file path accordingly
ds = xr.open_zarr('../data/FINAL_IO_025GRID_DAILY.zarr/')

# the time coordinate is stored unsorted, so sort it here
ds = ds.sortby('time')

Slicing data

ds
<xarray.Dataset>
Dimensions:          (time: 8523, lat: 177, lon: 241)
Coordinates:
  * lat              (lat) float32 32.0 31.75 31.5 31.25 ... -11.5 -11.75 -12.0
  * lon              (lon) float32 42.0 42.25 42.5 42.75 ... 101.5 101.8 102.0
  * time             (time) datetime64[ns] 1997-09-01 1997-09-02 ... 2020-12-31
Data variables: (12/14)
    CHL              (time, lat, lon) float32 dask.array<chunksize=(29, 177, 241), meta=np.ndarray>
    CHL_uncertainty  (time, lat, lon) float32 dask.array<chunksize=(29, 177, 241), meta=np.ndarray>
    adt              (time, lat, lon) float32 dask.array<chunksize=(29, 177, 241), meta=np.ndarray>
    air_temp         (time, lat, lon) float32 dask.array<chunksize=(29, 177, 241), meta=np.ndarray>
    direction        (time, lat, lon) float32 dask.array<chunksize=(29, 177, 241), meta=np.ndarray>
    sla              (time, lat, lon) float32 dask.array<chunksize=(29, 177, 241), meta=np.ndarray>
    ...               ...
    u_curr           (time, lat, lon) float32 dask.array<chunksize=(29, 177, 241), meta=np.ndarray>
    u_wind           (time, lat, lon) float32 dask.array<chunksize=(29, 177, 241), meta=np.ndarray>
    ug_curr          (time, lat, lon) float32 dask.array<chunksize=(29, 177, 241), meta=np.ndarray>
    v_curr           (time, lat, lon) float32 dask.array<chunksize=(29, 177, 241), meta=np.ndarray>
    v_wind           (time, lat, lon) float32 dask.array<chunksize=(29, 177, 241), meta=np.ndarray>
    vg_curr          (time, lat, lon) float32 dask.array<chunksize=(29, 177, 241), meta=np.ndarray>
Attributes: (12/17)
    creator_email:              minhphan@uw.edu
    creator_name:               Minh Phan
    creator_type:               person
    date_created:               2023-07-07
    geospatial_lat_max:         32.0
    geospatial_lat_min:         -12.0
    ...                         ...
    geospatial_lon_units:       degrees_east
    source:                     OSCAR, ERA5 Reanalysis, Copernicus Climate Ch...
    summary:                    Daily mean of 0.25 x 0.25 degrees gridded dat...
    time_coverage_end:          2020-12-31T23:59:59
    time_coverage_start:        2000-01-01T00:00:00
    title:                      Climate Data for Coastal Upwelling Machine Le...

We can slice data by the dimensions (latitude, longitude, time) and data variables.

# slice by latitude
# notice that we specify the range in reverse (high to low),
# because the lat coordinate is stored in descending order
ds.sel(lat=slice(0, -12))
<xarray.Dataset>
Dimensions:          (time: 8523, lat: 49, lon: 241)
Coordinates:
  * lat              (lat) float32 0.0 -0.25 -0.5 -0.75 ... -11.5 -11.75 -12.0
  * lon              (lon) float32 42.0 42.25 42.5 42.75 ... 101.5 101.8 102.0
  * time             (time) datetime64[ns] 1997-09-01 1997-09-02 ... 2020-12-31
Data variables: (12/14)
    CHL              (time, lat, lon) float32 dask.array<chunksize=(29, 49, 241), meta=np.ndarray>
    CHL_uncertainty  (time, lat, lon) float32 dask.array<chunksize=(29, 49, 241), meta=np.ndarray>
    adt              (time, lat, lon) float32 dask.array<chunksize=(29, 49, 241), meta=np.ndarray>
    air_temp         (time, lat, lon) float32 dask.array<chunksize=(29, 49, 241), meta=np.ndarray>
    direction        (time, lat, lon) float32 dask.array<chunksize=(29, 49, 241), meta=np.ndarray>
    sla              (time, lat, lon) float32 dask.array<chunksize=(29, 49, 241), meta=np.ndarray>
    ...               ...
    u_curr           (time, lat, lon) float32 dask.array<chunksize=(29, 49, 241), meta=np.ndarray>
    u_wind           (time, lat, lon) float32 dask.array<chunksize=(29, 49, 241), meta=np.ndarray>
    ug_curr          (time, lat, lon) float32 dask.array<chunksize=(29, 49, 241), meta=np.ndarray>
    v_curr           (time, lat, lon) float32 dask.array<chunksize=(29, 49, 241), meta=np.ndarray>
    v_wind           (time, lat, lon) float32 dask.array<chunksize=(29, 49, 241), meta=np.ndarray>
    vg_curr          (time, lat, lon) float32 dask.array<chunksize=(29, 49, 241), meta=np.ndarray>
Attributes: (12/17)
    creator_email:              minhphan@uw.edu
    creator_name:               Minh Phan
    creator_type:               person
    date_created:               2023-07-07
    geospatial_lat_max:         32.0
    geospatial_lat_min:         -12.0
    ...                         ...
    geospatial_lon_units:       degrees_east
    source:                     OSCAR, ERA5 Reanalysis, Copernicus Climate Ch...
    summary:                    Daily mean of 0.25 x 0.25 degrees gridded dat...
    time_coverage_end:          2020-12-31T23:59:59
    time_coverage_start:        2000-01-01T00:00:00
    title:                      Climate Data for Coastal Upwelling Machine Le...
# slice by longitude
ds.sel(lon=slice(42, 45))
<xarray.Dataset>
Dimensions:          (time: 8523, lat: 177, lon: 13)
Coordinates:
  * lat              (lat) float32 32.0 31.75 31.5 31.25 ... -11.5 -11.75 -12.0
  * lon              (lon) float32 42.0 42.25 42.5 42.75 ... 44.5 44.75 45.0
  * time             (time) datetime64[ns] 1997-09-01 1997-09-02 ... 2020-12-31
Data variables: (12/14)
    CHL              (time, lat, lon) float32 dask.array<chunksize=(29, 177, 13), meta=np.ndarray>
    CHL_uncertainty  (time, lat, lon) float32 dask.array<chunksize=(29, 177, 13), meta=np.ndarray>
    adt              (time, lat, lon) float32 dask.array<chunksize=(29, 177, 13), meta=np.ndarray>
    air_temp         (time, lat, lon) float32 dask.array<chunksize=(29, 177, 13), meta=np.ndarray>
    direction        (time, lat, lon) float32 dask.array<chunksize=(29, 177, 13), meta=np.ndarray>
    sla              (time, lat, lon) float32 dask.array<chunksize=(29, 177, 13), meta=np.ndarray>
    ...               ...
    u_curr           (time, lat, lon) float32 dask.array<chunksize=(29, 177, 13), meta=np.ndarray>
    u_wind           (time, lat, lon) float32 dask.array<chunksize=(29, 177, 13), meta=np.ndarray>
    ug_curr          (time, lat, lon) float32 dask.array<chunksize=(29, 177, 13), meta=np.ndarray>
    v_curr           (time, lat, lon) float32 dask.array<chunksize=(29, 177, 13), meta=np.ndarray>
    v_wind           (time, lat, lon) float32 dask.array<chunksize=(29, 177, 13), meta=np.ndarray>
    vg_curr          (time, lat, lon) float32 dask.array<chunksize=(29, 177, 13), meta=np.ndarray>
Attributes: (12/17)
    creator_email:              minhphan@uw.edu
    creator_name:               Minh Phan
    creator_type:               person
    date_created:               2023-07-07
    geospatial_lat_max:         32.0
    geospatial_lat_min:         -12.0
    ...                         ...
    geospatial_lon_units:       degrees_east
    source:                     OSCAR, ERA5 Reanalysis, Copernicus Climate Ch...
    summary:                    Daily mean of 0.25 x 0.25 degrees gridded dat...
    time_coverage_end:          2020-12-31T23:59:59
    time_coverage_start:        2000-01-01T00:00:00
    title:                      Climate Data for Coastal Upwelling Machine Le...
# slice by time
ds.sel(time=slice('1998', '1999'))
<xarray.Dataset>
Dimensions:          (time: 730, lat: 177, lon: 241)
Coordinates:
  * lat              (lat) float32 32.0 31.75 31.5 31.25 ... -11.5 -11.75 -12.0
  * lon              (lon) float32 42.0 42.25 42.5 42.75 ... 101.5 101.8 102.0
  * time             (time) datetime64[ns] 1998-01-01 1998-01-02 ... 1999-12-31
Data variables: (12/14)
    CHL              (time, lat, lon) float32 dask.array<chunksize=(7, 177, 241), meta=np.ndarray>
    CHL_uncertainty  (time, lat, lon) float32 dask.array<chunksize=(7, 177, 241), meta=np.ndarray>
    adt              (time, lat, lon) float32 dask.array<chunksize=(7, 177, 241), meta=np.ndarray>
    air_temp         (time, lat, lon) float32 dask.array<chunksize=(7, 177, 241), meta=np.ndarray>
    direction        (time, lat, lon) float32 dask.array<chunksize=(7, 177, 241), meta=np.ndarray>
    sla              (time, lat, lon) float32 dask.array<chunksize=(7, 177, 241), meta=np.ndarray>
    ...               ...
    u_curr           (time, lat, lon) float32 dask.array<chunksize=(7, 177, 241), meta=np.ndarray>
    u_wind           (time, lat, lon) float32 dask.array<chunksize=(7, 177, 241), meta=np.ndarray>
    ug_curr          (time, lat, lon) float32 dask.array<chunksize=(7, 177, 241), meta=np.ndarray>
    v_curr           (time, lat, lon) float32 dask.array<chunksize=(7, 177, 241), meta=np.ndarray>
    v_wind           (time, lat, lon) float32 dask.array<chunksize=(7, 177, 241), meta=np.ndarray>
    vg_curr          (time, lat, lon) float32 dask.array<chunksize=(7, 177, 241), meta=np.ndarray>
Attributes: (12/17)
    creator_email:              minhphan@uw.edu
    creator_name:               Minh Phan
    creator_type:               person
    date_created:               2023-07-07
    geospatial_lat_max:         32.0
    geospatial_lat_min:         -12.0
    ...                         ...
    geospatial_lon_units:       degrees_east
    source:                     OSCAR, ERA5 Reanalysis, Copernicus Climate Ch...
    summary:                    Daily mean of 0.25 x 0.25 degrees gridded dat...
    time_coverage_end:          2020-12-31T23:59:59
    time_coverage_start:        2000-01-01T00:00:00
    title:                      Climate Data for Coastal Upwelling Machine Le...
# slice by variable
ds[['u_curr', 'u_wind']]
<xarray.Dataset>
Dimensions:  (time: 8523, lat: 177, lon: 241)
Coordinates:
  * lat      (lat) float32 32.0 31.75 31.5 31.25 ... -11.25 -11.5 -11.75 -12.0
  * lon      (lon) float32 42.0 42.25 42.5 42.75 ... 101.2 101.5 101.8 102.0
  * time     (time) datetime64[ns] 1997-09-01 1997-09-02 ... 2020-12-31
Data variables:
    u_curr   (time, lat, lon) float32 dask.array<chunksize=(29, 177, 241), meta=np.ndarray>
    u_wind   (time, lat, lon) float32 dask.array<chunksize=(29, 177, 241), meta=np.ndarray>
Attributes: (12/17)
    creator_email:              minhphan@uw.edu
    creator_name:               Minh Phan
    creator_type:               person
    date_created:               2023-07-07
    geospatial_lat_max:         32.0
    geospatial_lat_min:         -12.0
    ...                         ...
    geospatial_lon_units:       degrees_east
    source:                     OSCAR, ERA5 Reanalysis, Copernicus Climate Ch...
    summary:                    Daily mean of 0.25 x 0.25 degrees gridded dat...
    time_coverage_end:          2020-12-31T23:59:59
    time_coverage_start:        2000-01-01T00:00:00
    title:                      Climate Data for Coastal Upwelling Machine Le...
# combine multiple slicing options all at once
ds[['u_curr', 'u_wind']].sel(time=slice('1998', '1999'), 
                             lat=slice(0, -12), 
                             lon=slice(42, 45))
<xarray.Dataset>
Dimensions:  (time: 730, lat: 49, lon: 13)
Coordinates:
  * lat      (lat) float32 0.0 -0.25 -0.5 -0.75 ... -11.25 -11.5 -11.75 -12.0
  * lon      (lon) float32 42.0 42.25 42.5 42.75 43.0 ... 44.25 44.5 44.75 45.0
  * time     (time) datetime64[ns] 1998-01-01 1998-01-02 ... 1999-12-31
Data variables:
    u_curr   (time, lat, lon) float32 dask.array<chunksize=(7, 49, 13), meta=np.ndarray>
    u_wind   (time, lat, lon) float32 dask.array<chunksize=(7, 49, 13), meta=np.ndarray>
Attributes: (12/17)
    creator_email:              minhphan@uw.edu
    creator_name:               Minh Phan
    creator_type:               person
    date_created:               2023-07-07
    geospatial_lat_max:         32.0
    geospatial_lat_min:         -12.0
    ...                         ...
    geospatial_lon_units:       degrees_east
    source:                     OSCAR, ERA5 Reanalysis, Copernicus Climate Ch...
    summary:                    Daily mean of 0.25 x 0.25 degrees gridded dat...
    time_coverage_end:          2020-12-31T23:59:59
    time_coverage_start:        2000-01-01T00:00:00
    title:                      Climate Data for Coastal Upwelling Machine Le...
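Label-based slicing can be complemented with nearest-neighbour lookup (`method='nearest'`) and positional indexing via `.isel`. A minimal sketch on a synthetic grid with the same shape as our dataset (not the dataset itself):

```python
import numpy as np
import xarray as xr

# synthetic 0.25-degree grid, lat descending like our dataset
lat = np.arange(32.0, -12.25, -0.25)
lon = np.arange(42.0, 102.25, 0.25)
ds_demo = xr.Dataset(
    {"speed": (("lat", "lon"), np.zeros((lat.size, lon.size)))},
    coords={"lat": lat, "lon": lon},
)

# pick the grid cell closest to a point that is not exactly on the grid
point = ds_demo.sel(lat=10.1, lon=75.9, method="nearest")

# positional indexing: first latitude row, last longitude column
corner = ds_demo.isel(lat=0, lon=-1)
```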

Basic data graphing

We can also graph data directly from a slice, for example heatmaps from 2D arrays or line charts from 1D arrays. This is especially useful when we want to inspect elements on the go.

# make sure that the array you slice for a heatmap visualization is a 2D array
heatmap_arr = ds['speed'].sel(time='2000-01-02')
heatmap_arr
<xarray.DataArray 'speed' (lat: 177, lon: 241)>
dask.array<getitem, shape=(177, 241), dtype=float32, chunksize=(177, 241), chunktype=numpy.ndarray>
Coordinates:
  * lat      (lat) float32 32.0 31.75 31.5 31.25 ... -11.25 -11.5 -11.75 -12.0
  * lon      (lon) float32 42.0 42.25 42.5 42.75 ... 101.2 101.5 101.8 102.0
    time     datetime64[ns] 2000-01-02
Attributes:
    long_name:  10 metre absolute speed
    units:      m s**-1

2D array graphing with Matplotlib

By default, Xarray plots arrays using Matplotlib.

For 2D arrays, we have multiple plot types to choose from:

Heatmaps

heatmap_arr.plot.imshow()
<matplotlib.image.AxesImage at 0x7f55c4c574c0>

Contour maps

# contour map with no filling
heatmap_arr.plot.contour()
<matplotlib.contour.QuadContourSet at 0x7f55c41ead60>

# contour map with color filling
heatmap_arr.plot.contourf()
<matplotlib.contour.QuadContourSet at 0x7f55c40b8a90>

Surface plot

heatmap_arr.plot.surface()
<mpl_toolkits.mplot3d.art3d.Poly3DCollection at 0x7f55b4f46fd0>

1D array graphing with Matplotlib

We can also plot line graphs if our data is presented as 1D arrays. In the example below, we average wind speed over the whole area and plot it over time. Xarray automatically infers which kind of plot to create if you don't explicitly request one:

ds['speed'].mean(dim=['lat', 'lon']).plot(figsize=(10, 5))

We can pass extra parameters to customize our graphs; additional arguments are forwarded to the underlying Matplotlib plot() function.

ds['air_temp'].mean(dim=['lat', 'lon']).sel(time=slice('2007', '2009')).plot.line('r-o', figsize=(10,5), markersize=1)

# get the current Axes object, creating a new one
# if none is currently available
ax = plt.gca()
ds['direction'].plot.hist(ax=ax)
ax.set_xlabel('10 metre wind direction (degrees east)')
ax.set_ylabel('frequency')
ax.set_title('Daily average wind direction distribution over covered area (1979-2022)')
Text(0.5, 1.0, 'Daily average wind direction distribution over covered area (1979-2022)')

Graphing with holoviews

Xarray also supports graphing with HoloViews (via hvPlot) if you want interactive visualizations.

# scroll along the axes to stretch the graph
heatmap_arr.hvplot().options(cmap='bgy', width=600, height=500)
# resampling directly in Xarray is having issues, so as a temporary
# workaround we use Pandas to resample one region by month

# choose a small region just in case overloading happens

ds_to_resample = ds['CHL'].sel(lat=slice(10, 5), lon=slice(75, 80)).to_dataframe()
df_resampled = ds_to_resample.groupby([pd.Grouper(freq='1M', level='time'),
                                       pd.Grouper(level='lon'),
                                       pd.Grouper(level='lat')]).mean()
ds_resampled = xr.Dataset.from_dataframe(df_resampled)

We can see that after resampling, our time dimension size is reduced from 8523 (days) to 280 (months). Resampling is successful!
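For reference, Xarray does have a native `resample` method that performs the same monthly mean; it was giving trouble on our Zarr-backed dataset, but on a plain in-memory array it works as expected. A sketch on a synthetic daily series (not our dataset):

```python
import numpy as np
import pandas as pd
import xarray as xr

# synthetic daily series covering two months (2000 is a leap year)
time = pd.date_range("2000-01-01", "2000-02-29", freq="D")
chl = xr.DataArray(np.ones(time.size), coords={"time": time}, dims="time", name="CHL")

# monthly mean, analogous to pd.Grouper(freq='1M') in the Pandas workaround
monthly = chl.resample(time="1M").mean()
```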

# original ds_to_resample object before converting to dataframe
ds['CHL'].sel(lat=slice(10, 5), lon=slice(75, 80))
<xarray.DataArray 'CHL' (time: 8523, lat: 21, lon: 21)>
dask.array<getitem, shape=(8523, 21, 21), dtype=float32, chunksize=(100, 21, 21), chunktype=numpy.ndarray>
Coordinates:
  * lat      (lat) float32 10.0 9.75 9.5 9.25 9.0 8.75 ... 6.0 5.75 5.5 5.25 5.0
  * lon      (lon) float32 75.0 75.25 75.5 75.75 76.0 ... 79.25 79.5 79.75 80.0
  * time     (time) datetime64[ns] 1997-09-01 1997-09-02 ... 2020-12-31
Attributes:
    _ChunkSizes:                [1, 256, 256]
    ancillary_variables:        flags CHL_uncertainty
    coverage_content_type:      modelResult
    input_files_reprocessings:  Processors versions: MODIS R2022.0NRT/VIIRSN ...
    long_name:                  Chlorophyll-a concentration - Mean of the bin...
    standard_name:              mass_concentration_of_chlorophyll_a_in_sea_water
    type:                       surface
    units:                      milligram m-3
    valid_max:                  1000.0
    valid_min:                  0.0
ds_resampled
<xarray.Dataset>
Dimensions:  (time: 280, lon: 21, lat: 21)
Coordinates:
  * time     (time) datetime64[ns] 1997-09-30 1997-10-31 ... 2020-12-31
  * lon      (lon) float32 75.0 75.25 75.5 75.75 76.0 ... 79.25 79.5 79.75 80.0
  * lat      (lat) float32 5.0 5.25 5.5 5.75 6.0 6.25 ... 9.0 9.25 9.5 9.75 10.0
Data variables:
    CHL      (time, lon, lat) float32 0.1385 0.1499 0.1564 ... 3.708 nan 1.711

… and of course we can graph this data, too!

CHL_month = ds_resampled.mean(dim=['lat', 'lon']).hvplot(label='monthly mean').options(color='red')
CHL_day = ds['CHL'].sel(lat=slice(10, 5), lon=slice(75, 80)).mean(dim=['lat', 'lon']).hvplot(label='daily data').options(color='blue')
(CHL_day*CHL_month).options(title='Monthly vs daily chlorophyll-a levels', xlabel='year')

Contour mapping

Sometimes we want to identify areas in which our data may behave differently. One of the best ways to do this is to create contour maps.

Coastline mapping

Coastlines help us separate land from ocean areas. We can plot the coastline for our region of interest without using our data.

geo_axes = plt.axes(projection=cartopy.crs.PlateCarree())
# add stock image
# geo_axes.stock_img()
geo_axes.set_extent([42, 102, -12, 32])
geo_axes.coastlines()
<cartopy.mpl.feature_artist.FeatureArtist at 0x7f075b7164f0>

We can also overlay the coastline onto our data. The example below shows a heatmap with the coastline drawn on top of it.

topo = xr.open_dataset('../data/finalized/topography.nc')
fig, ax = plt.subplots(subplot_kw=dict(projection=cartopy.crs.PlateCarree()))
topo.topo.plot.imshow(ax=ax)
ax.add_feature(cartopy.feature.COASTLINE)
<cartopy.mpl.feature_artist.FeatureArtist at 0x7f17db906580>

Topography mapping

We also have a topography dataset that spans the same grid as our final dataset. At the time of writing, this dataset has not yet been incorporated into the main dataset; as seen earlier in this notebook, we have to import it separately.
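When the two grids align exactly, combining them would be straightforward with `xr.merge`. A minimal sketch on synthetic data (the names `ds_main` and `ds_topo` are illustrative; the real files may need coordinate alignment first):

```python
import numpy as np
import xarray as xr

# two tiny datasets sharing the same lat/lon grid
lat = [32.0, 31.75]
lon = [42.0, 42.25]

ds_main = xr.Dataset(
    {"speed": (("lat", "lon"), np.zeros((2, 2)))},
    coords={"lat": lat, "lon": lon},
)
ds_topo = xr.Dataset(
    {"topo": (("lat", "lon"), np.array([[349.0, 307.0], [292.0, 250.0]]))},
    coords={"lat": lat, "lon": lon},
)

# combine the two datasets on their shared grid
ds_merged = xr.merge([ds_main, ds_topo])
```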

We can use this dataset to visualize the topography through contour maps.

topo
<xarray.Dataset>
Dimensions:  (lon: 241, lat: 177)
Coordinates:
  * lon      (lon) float32 42.0 42.25 42.5 42.75 ... 101.2 101.5 101.8 102.0
  * lat      (lat) float32 32.0 31.75 31.5 31.25 ... -11.25 -11.5 -11.75 -12.0
Data variables:
    topo     (lat, lon) float64 349.0 307.0 292.0 ... -3.285e+03 -3.727e+03
fig, (ax1, ax2) = plt.subplots(2,1, figsize=(8, 12))
topo_contour = topo.topo.plot.contour(levels=12, ax=ax1, cmap='coolwarm')
topo_contourf = topo.topo.plot.contourf(levels=12, ax=ax2, cmap='coolwarm', label='')
# add label in line
# ax.clabel(topo_contour, inline=1, fontsize=10)

hue1, color1 = topo_contour.legend_elements()
hue2, color2 = topo_contourf.legend_elements()
ax1.legend(hue1, color1, loc='center left', bbox_to_anchor=(1, 0.5))
ax1.set_aspect('equal')
ax2.set_aspect('equal')
# ax2.get_legend().set_visible(False)

Exporting the contour lines to graph on other datasets:

# extract lines from contour paths
lines = []
for contour_path in topo_contour.collections:
    for path in contour_path.get_paths():
        lines.extend(path.to_polygons())

In the example below, we graph the contour lines back onto the original contour maps.

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 7))
topo.topo.plot.imshow(ax=ax1)
for contour in topo_contour.collections[2:9]:
    for path in contour.get_paths():
        for line in path.to_polygons():
            ax1.scatter(line[:,0], line[:, 1], color='black', s=0.1) # graph some of the contours (as dots)

topo.topo.plot.imshow(ax=ax2)
for line in lines:
    ax2.scatter(line[:,0], line[:, 1], color='black', s=0.5) # graph all contours (as dots)
    
ax1.set_aspect('equal')
ax2.set_aspect('equal')

# plotting the polygons as connected lines produces very weird artifacts,
# because open-ended contour paths get closed when converted to polygons
ax = plt.gca()
topo.topo.plot.imshow(ax=ax)
for line in lines[:1200]: # the lines are ordered by topography height
    ax.plot(line[:,0], line[:, 1], color='black', markersize=0.1)

Creating topography contour maps with pygmt

Because of the weird lines caused by closing the open-ended contour paths, we want to experiment with other packages.

# Load Earth relief grids (topography and bathymetry) in various resolutions.
# get contour grid
grid = pygmt.datasets.load_earth_relief(resolution="15m", region=[42, 102, -12, 32])
fig = pygmt.Figure()
# add the heatmap first
fig.grdimage(topo.topo, region='42/102/-12/32', cmap='topo', projection='M4i')
# and then the contour map on top of it
fig.grdcontour(grid=grid,
              annotation='1000+f6p', # font 6p
              interval=1000,
              limit=[-5000, 0],
              projection='M4i',
              pen="a0.15p") # 0.15 point
fig.show()

# coastline only
fig2 = pygmt.Figure()
fig2.grdimage(topo.topo, region='42/102/-12/32', cmap='topo', projection='M4i')
fig2.coast(
    region='42/102/-12/32',
    projection='M4i',
    shorelines=True
)
fig2.colorbar()
fig2.show()

# meridional (v) wind component with topography contours
grid = pygmt.datasets.load_earth_relief(resolution="15m", region=[42, 102, -12, 32])

fig3 = pygmt.Figure()
fig3.grdimage(ds.v_wind.isel(time=0), region='42/102/-12/32', projection='M4i')
fig3.grdcontour(grid=grid,
              annotation='1000+f6p', # font 6p
              interval=1000,
              limit=[-5000, 0],
              projection='M4i',
              pen="a0.15p") # 0.15 point
fig3.coast(
    region='42/102/-12/32',
    projection='M4i',
    shorelines="1p,black,solid"
)
fig3.colorbar()
fig3.show()

# wind direction
fig3 = pygmt.Figure()
fig3.grdimage(ds.direction.isel(time=0), region='42/102/-12/32', projection='M4i')
fig3.coast(
    region='42/102/-12/32',
    projection='M4i',
    shorelines=True
)
fig3.colorbar()
fig3.show()
fig, ax = plt.subplots(subplot_kw=dict(projection=cartopy.crs.PlateCarree()))
ds.direction.isel(time=0).plot.imshow(ax=ax)
ax.set_aspect('equal')
ax.add_feature(cartopy.feature.COASTLINE)
<cartopy.mpl.feature_artist.FeatureArtist at 0x7f182e0d1a00>